Data communication and data science
GEOG 30323
November 27, 2018
Course recap
- Thus far: we’ve focused on exploratory data analysis, which involves data wrangling, summarization, and visualization
- Your data analysis journey shouldn’t stop here! Topics to consider:
- Explanatory vs. exploratory visualization
- Statistics and data science
- Data ethics and “big data” (next week)
Communicating with data
- Once you’ve done all of the hard work wrangling your data, you’ll want to communicate insights to others!
- This might include:
- Polished data products or reports
- Models that can scale your insights
Explanatory visualization
- So far we’ve mostly worked with exploratory visualization: internally facing visualizations that help us reveal insights about our data
- Externally facing data products often include explanatory visualizations, which have a polished design and emphasize one or two key points
Interactive reports
- Example: a data journalism article - or your Jupyter Notebook!
- Key distinction: your code, data exploration, etc. will likely be external to the report (this can vary depending on the context, however)
Tableau

- Highly popular software for data visualization - both exploratory and explanatory
- Intuitive, drag-and-drop interface
- Key feature: the dashboard
Infographics
Obesity infographics:
Data Science
- Data science: new(ish) field that has emerged to address the challenges of working with modern data
- Fuses statistics, computer science, visualization, graphic design, and the humanities/social sciences/natural sciences…
The data analysis process

Visualization vs. modeling
Hadley Wickham (paraphrased):
Visualization can surprise you, but it doesn’t scale well. Modeling scales well, but it doesn’t (fundamentally) surprise.
Statistical modeling
- What is the mathematical relationship between an outcome variable \(Y\) and one or more other “predictor” variables \(X_{1}, \ldots, X_{n}\)?
- Recall our use of lmplot in seaborn - lm stands for linear model
The linear model:
\[ Y = Xb + e \]
where \(Y\) represents the outcome variable, \(X\) is a matrix of predictors, \(b\) represents the “parameters”, and \(e\) represents the errors, or “residuals”
- Linear models will not always be appropriate for modeling relationships between variables!
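As a minimal sketch of the linear model above, the parameters \(b\) can be estimated by least squares with NumPy. The data are synthetic and the variable names are illustrative assumptions, not part of the course materials.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends linearly on x, plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=100)

# X is the matrix of predictors; the column of ones lets the model estimate an intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of the parameters b in Y = Xb + e
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated intercept:", b[0])
print("estimated slope:", b[1])
```

The estimates should land close to the intercept and slope used to generate the data, though not exactly on them, because of the random errors \(e\).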
Statistics in Python
- Substantial statistical functionality available in the statsmodels package, which installs with Anaconda
Let’s get an example ready:
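One way to set up such an example is to fit an ordinary least squares model with the statsmodels formula interface. This is a sketch under stated assumptions: the data are synthetic, and the column names `x` and `y` are hypothetical rather than taken from a course dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with hypothetical columns "x" and "y"
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=200)})
df["y"] = 1.5 + 0.8 * df["x"] + rng.normal(0, 1, size=200)

# Fit y as a linear function of x; the formula adds an intercept automatically
model = smf.ols("y ~ x", data=df).fit()

print(model.params)     # estimated intercept and slope
print(model.rsquared)   # share of the variance in y explained by the model
```

Calling `model.summary()` prints a fuller regression table with standard errors and p-values.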
Residuals and fitted values
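Fitted values are the model's predictions for each observation, and residuals are the observed values minus the fitted values. A minimal sketch with NumPy follows, using synthetic data and illustrative variable names (both assumptions).

```python
import numpy as np

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 4.0 - 0.5 * x + rng.normal(0, 1, size=50)

# Fit the linear model by least squares
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b           # the model's predicted value for each observation
residuals = y - fitted   # observed value minus fitted value

print("mean residual:", residuals.mean())
```

When the model includes an intercept, the least-squares residuals average to zero up to rounding error. In statsmodels, a fitted OLS result exposes the same quantities as `fittedvalues` and `resid`.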
Machine learning
- “The science of getting computers to act without being explicitly programmed”
- Types of machine learning algorithms: supervised and unsupervised
- Topics in machine learning: classification, clustering, regression
Visual introduction to machine learning: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
Example: K-means clustering
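K-means is an unsupervised method: it groups observations into k clusters by alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below is a basic version of that loop (Lloyd's algorithm) in NumPy; the `kmeans` function and the synthetic two-blob data are illustrative assumptions, not the course's example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (a centroid with no points stays where it is)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, labels

# Two well-separated synthetic blobs (an assumption for illustration)
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(points, k=2)
print(centroids)
```

Results depend on the starting centroids, so in practice the algorithm is usually run several times from different starts and the best result is kept.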
Example: nearest-neighbor search
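A nearest-neighbor search finds the observations closest to a query point. The brute-force sketch below computes the distance from the query to every point and keeps the k smallest. The `nearest_neighbors` function and the coordinates are hypothetical, and the code assumes planar coordinates where straight-line (Euclidean) distance is meaningful.

```python
import numpy as np

def nearest_neighbors(points, query, k=1):
    """Return the indices of the k points closest to query (Euclidean distance)."""
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical 2-D locations, for example projected x/y coordinates
points = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 1.0], [9.0, 2.0]])
query = np.array([0.9, 1.2])

idx = nearest_neighbors(points, query, k=2)
print(idx)          # indices of the two closest points, nearest first
print(points[idx])  # their coordinates
```

Checking every point is fine for small datasets; larger ones typically use a spatial index such as a k-d tree to avoid computing every distance.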
Making predictions


How to learn more
- Take statistics and machine learning courses here at TCU!
- Check out DataCamp for hundreds of courses on data science in Python and R